Develop upstream sync 251224 #3170

mmakevic-amd · 2025-12-24T15:16:02Z

Motivation

Bi-weekly sync from TensorFlow upstream

Disabled tests:

I reviewed the previously disabled UTs: some were re-enabled, and the rest were moved to the excluded list in the testing scripts. Details are in https://github.com/ROCm/frameworks-internal/issues/14968

Submission Checklist

[x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

PiperOrigin-RevId: 846167560

…intExpression. Helps with narrowing down which constraints are unsat. There can be many constraints (e.g. WGMMA in Mosaic), and while debugging it's unclear which one is violated at a glance. As a follow up, we can also introduce names to each Constraint to make the identification even easier. PiperOrigin-RevId: 846168559

PiperOrigin-RevId: 846171859

PiperOrigin-RevId: 846173555

…TF normalization in emitters 0) Fix a bug (?) in normalization util when normalized dim contains a single dimension 1) Perform normalization OTF for Transpose emitter selection 2) Use normalized shape for unrolling decision in kLoop emitter 3) Use normalized shape to detect slow transposes in triton fusion rewriter PiperOrigin-RevId: 846191206

…t.cc This change updates custom_call_test.cc to dynamically register custom call targets and FFI handlers using the runtime-determined platform name (CUDA or ROCM). This replaces the use of static registration macros, allowing the tests to run correctly across different GPU platforms and the reference interpreter. This way we can avoid compile time branches like `#ifdef GOOGLE_CUDA` and similar. Also: 1. Converts usage of raw CUDA driver API functions to StreamExecutor functionality 2. Replaces some legacy CustomCalls by FFI 3. Converts the while test target to HloRunnerPjRt 4. Removes a test case from the Token tests with a nested type in the output type, since that's not supported by our PjRt implementation. PiperOrigin-RevId: 846196106

The `fd.Size()` check doesn't work when the file descriptor is invalid and only the path was given. PiperOrigin-RevId: 846207406

PiperOrigin-RevId: 846213195

PiperOrigin-RevId: 846214738

PiperOrigin-RevId: 846217449

PiperOrigin-RevId: 846221230

PiperOrigin-RevId: 846221752

The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete. PiperOrigin-RevId: 846226180

PiperOrigin-RevId: 846226345

PiperOrigin-RevId: 846231902

PiperOrigin-RevId: 846234559

PiperOrigin-RevId: 846238886

This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846246070

This change moves the definition of `AotCompilationResult` into a new header file `compiled_module.h` and renames the class to `CompiledModule`. `CompilationResult` would have been the preferred name, but it's already in-use elsewhere. The original `AotCompilationResult` is kept as a deprecated alias. PiperOrigin-RevId: 846246415
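A minimal sketch of the rename-with-alias pattern described above. The class body here is a hypothetical stand-in, not the real `CompiledModule` interface, and the use of the `[[deprecated]]` attribute is an assumption about how such an alias might be marked:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the renamed class. The real CompiledModule is
// declared in compiled_module.h and has a different interface.
class CompiledModule {
 public:
  explicit CompiledModule(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// The old name stays available as an alias, so existing callers keep
// compiling but get a deprecation warning pointing at the new name.
using AotCompilationResult [[deprecated("Use CompiledModule instead")]] =
    CompiledModule;
```

Because the alias names the same type, code written against the old name keeps its behavior while it is migrated.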

…ests, rather than on the original dimensions. These are simpler both to write and to think about. No behavior changes are intended. PiperOrigin-RevId: 846253300

PiperOrigin-RevId: 846257722

… its allocation later Imported from GitHub PR openxla/xla#35510 📝 Summary of Changes Initialize collectives pointer to nullptr 🎯 Justification GPU runtime options are initialized in TF and transferred to XLA to execute thunks. Since the memory is not cleared, the collectives pointer refers to uninitialized memory, resulting in a segfault during NCCL collective initialization and operation. 🚀 Kind of Contribution 🐛 Bug Fix Copybara import of the project: -- 2bfc6fbddbf2f9a926dd504169c56be45d2f1a0a by Harsha HS <[email protected]>: [ROCm] Initialze collectives to nullptr to force its allocation later Merging this change closes tensorflow#35510 PiperOrigin-RevId: 846266642
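The failure mode and the fix can be sketched as follows. All names here (`Collectives`, `RuntimeOptions`, `GetOrCreateCollectives`) are hypothetical stand-ins for illustration, not the XLA types: a null pointer reliably signals "not allocated yet", whereas an indeterminate one may look valid and be dereferenced.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the collectives implementation.
struct Collectives {
  int id = 0;
};

// Hypothetical options struct. Without the `= nullptr` initializer, a
// default-initialized instance would carry an indeterminate pointer.
struct RuntimeOptions {
  Collectives* collectives = nullptr;
};

// Allocates the collectives object on first use. This only works if the
// pointer is guaranteed to be null until something has been allocated.
Collectives* GetOrCreateCollectives(RuntimeOptions& options,
                                    std::unique_ptr<Collectives>& storage) {
  if (options.collectives == nullptr) {
    storage = std::make_unique<Collectives>();
    options.collectives = storage.get();
  }
  return options.collectives;
}
```

A second call with the same options returns the pointer allocated by the first call.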

This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846268375

…utor_test. The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc. PiperOrigin-RevId: 846269233

Imported from GitHub PR openxla/xla#35482 Sometimes the compile commands from Bazel are parsed incorrectly from JSON, so a flag and its value are passed to `clangd` as a single argument, e.g. ``` "-isystem path/to/includes" ```, and `clangd` then misparses these flags. Copybara import of the project: -- adf291e21b098d79fa3be4065ee02fafdf5c660a by Eugene Zhulenev <[email protected]>: Correctly generate compile_commands.json Merging this change closes tensorflow#35482 PiperOrigin-RevId: 846269357
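To illustrate the problem, here is a sketch that splits fused arguments such as `"-isystem path/to/includes"` into separate entries. `SplitFusedArguments` is a hypothetical helper and is not taken from the upstream change, which may fix the generation step differently:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits every argument on whitespace so that a fused
// "-isystem path" entry becomes two separate arguments.
// Caveat: this naive split would also break paths that contain spaces.
std::vector<std::string> SplitFusedArguments(
    const std::vector<std::string>& args) {
  std::vector<std::string> out;
  for (const std::string& arg : args) {
    std::istringstream stream(arg);
    std::string piece;
    while (stream >> piece) {
      out.push_back(piece);
    }
  }
  return out;
}
```

Given `{"clang", "-isystem path/to/includes", "-c", "a.cc"}`, the helper yields five arguments, with `-isystem` and its path as separate entries.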

Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an invalid file name. PiperOrigin-RevId: 846275995

…iguous send/recv buffers Imported from GitHub PR openxla/xla#35463 With latest NCCL we can use `ncclAlltoall` API directly without having to launch grouped send and recv operations. Copybara import of the project: -- 0630f4d48049b211442dcb1754e521a4b1f37f7b by Eugene Zhulenev <[email protected]>: [xla:gpu] Support ncclAlltoall directly for contiguous send/recv buffers Merging this change closes tensorflow#35463 PiperOrigin-RevId: 846277559

…is supported by libraries. PiperOrigin-RevId: 846299624

We can add an output pointer to StreamState, and it will then have all the information needed for the rendezvous. No need to have a separate RendezvousValue struct. PiperOrigin-RevId: 846313928

For example if we have a fusion ``` dot bitcast1 ... bad_op ... bitcast2 ... ROOT root = ... ``` we can still benefit from sinking bitcast2 even though instructions between dot and bad_op will not change. PiperOrigin-RevId: 846314341

PiperOrigin-RevId: 848393091

PiperOrigin-RevId: 848423026

PiperOrigin-RevId: 848429925

PiperOrigin-RevId: 848434764

PiperOrigin-RevId: 848441651

…stub. The `xtile_compiler` target now acts as a selector, depending on either `xtile_compiler_impl` or `xtile_compiler_stub` based on whether CUDA or ROCm is configured. The full implementation is moved to the new `xtile_compiler_impl` target, while `xtile_compiler_stub` provides a minimal version for other configurations. This has the advantage that build_cleaner can run on xtile_compiler_impl. (Doing that removed around 20 dependencies) PiperOrigin-RevId: 848442213

PiperOrigin-RevId: 848455572

PiperOrigin-RevId: 848467225

PiperOrigin-RevId: 848467272

PiperOrigin-RevId: 848475361

It has to become a part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT. So, moving it here. PiperOrigin-RevId: 848523186

PiperOrigin-RevId: 848534440

…sync-251224

i-chaochen · 2026-01-05T10:29:47Z

This test failed; the backend config (h100_sxm) seems to be incorrect.

@local_xla//xla/tools:xla_gpu_compile_lib_test_amdgpu_any                FAILED in 13.4s

[2025-12-30T13:08:37.673Z] [ RUN      ] XlaCompileLibTest.CompilesForGpuWithoutDevice
[2025-12-30T13:08:37.673Z] external/local_xla/xla/tools/xla_gpu_compile_lib_test.cc:80: Failure
[2025-12-30T13:08:37.673Z] Value of: (tsl::ReadTextProto(tsl::Env::Default(), target_config_path, &target_config))
[2025-12-30T13:08:37.673Z] Expected: is OK
[2025-12-30T13:08:37.673Z]   Actual: NOT_FOUND: /root/.cache/bazel/_bazel_root/f14ffb85b056b92f87114ec3419b920b/execroot/org_tensorflow/bazel-out/k8-opt/bin/external/local_xla/xla/tools/xla_gpu_compile_lib_test_amdgpu_any.runfiles/org_tensorflow/xla/backends/gpu/target_config/specs/h100_sxm.txtpb; No such file or directory (of type absl::lts_20250814::Status)
[2025-12-30T13:08:37.673Z] 
[2025-12-30T13:08:37.673Z] [  FAILED  ] XlaCompileLibTest.CompilesForGpuWithoutDevice (0 ms)

…ipts

…script

mmakevic-amd · 2026-01-13T00:12:20Z

This is a deviceless test; the problem was in the file path. Fixed in 3a69036.

mmakevic-amd · 2026-01-14T10:56:41Z

Hi @i-chaochen can we merge this?

i-chaochen

Thanks!

Yes, please merge it, and remember to push the tag.

tensorflower-gardener and others added 30 commits December 18, 2025 02:55

Automated Code Change

fe216f0

PiperOrigin-RevId: 846167560

Automated Code Change

3580807

PiperOrigin-RevId: 846171859

Automated Code Change

9024ef1

PiperOrigin-RevId: 846173555

Add a function to check for empty/non existing files.

69cd9be

The `fd.Size()` check doesn't work when the file descriptor is invalid and only the path was given. PiperOrigin-RevId: 846207406
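A path-based check of this kind could look like the sketch below. `IsMissingOrEmpty` is a hypothetical name and uses `std::filesystem`; the function actually added upstream may have a different name, signature, and implementation:

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: returns true when `path` does not name an existing
// regular file, or when that file has zero bytes. It works from the path
// alone, so it does not need a valid file descriptor.
bool IsMissingOrEmpty(const std::string& path) {
  namespace fs = std::filesystem;
  std::error_code ec;
  if (!fs::is_regular_file(path, ec) || ec) {
    return true;
  }
  const auto size = fs::file_size(path, ec);
  return ec || size == 0;
}
```

The non-throwing `std::error_code` overloads are used so that a missing file is reported as a result rather than as an exception.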

Update XNNPack version

08d6df5

PiperOrigin-RevId: 846213195

Automated Code Change

f17984d

PiperOrigin-RevId: 846214738

Automated Code Change

d5820b3

PiperOrigin-RevId: 846217449

When opening a file, check that the file path is not null.

d393372

PiperOrigin-RevId: 846221230

Automated Code Change

6286fcc

PiperOrigin-RevId: 846221752

Remove forgotten ROCM version checks from NcclCollectives

b64c84f

The ROCm code path doesn't go through NcclCollectives anymore. Therefore these checks are obsolete. PiperOrigin-RevId: 846226180

Automated Code Change

17fa72a

PiperOrigin-RevId: 846226345

Automated Code Change

6457884

PiperOrigin-RevId: 846231902

Automated Code Change

5e49ee5

PiperOrigin-RevId: 846234559

[XLA:GPU] Support partitioned across replicas modules

4e34cc6

PiperOrigin-RevId: 846238886

Apply llvm-use-new-mlir-op-builder fixes

50c19ba

This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846246070
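The shape of this migration can be illustrated with toy types. `ToyBuilder` and `ToyAddOp` below are hypothetical stand-ins and not the MLIR classes; the real ops also take a location and other arguments that this sketch omits:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an op builder; NOT the MLIR OpBuilder.
struct ToyBuilder {
  std::vector<std::string> log;  // Names of the ops created so far.

  // Old-style entry point: builder.create<Op>(args...).
  // Here it simply forwards to the op's static create function.
  template <typename Op, typename... Args>
  Op create(Args... args) {
    return Op::create(*this, args...);
  }
};

// Toy stand-in for an op class.
struct ToyAddOp {
  int lhs;
  int rhs;

  // New-style entry point: Op::create(builder, args...).
  static ToyAddOp create(ToyBuilder& builder, int lhs, int rhs) {
    builder.log.push_back("toy.add");
    return ToyAddOp{lhs, rhs};
  }
};
```

In this sketch, `builder.create<ToyAddOp>(1, 2)` and `ToyAddOp::create(builder, 1, 2)` build the same op; the migration only changes which spelling call sites use.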

[PJRT] Change the two optimizations in Transpose to operate on Loop n…

9d0d22d

…ests, rather than on the original dimensions. These are simpler both to write and to think about. No behavior changes are intended. PiperOrigin-RevId: 846253300

Reverts 408bf09

a59ffc0

PiperOrigin-RevId: 846257722

Apply llvm-use-new-mlir-op-builder fixes

434dd85

This migrates `builder.create<Op>()` => `Op::create()` PiperOrigin-RevId: 846268375

Remove unnecessary local_defines and add missing includes in gpu_exec…

fbfba09

…utor_test. The local_defines for CUDA/ROCM are not required for this test. Added explicit includes for headers used in gpu_executor_test.cc. PiperOrigin-RevId: 846269233

In FileDescriptor tests, improve temporary file path generation.

f4c5fe5

Depending on the compiler, `testing::TempDir() + __FUNCTION__` may generate an invalid file name. PiperOrigin-RevId: 846275995
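One way to make such a path robust is to sanitize the function name before appending it, sketched below. `SanitizeForFileName` and `TempFilePath` are hypothetical helpers, not the upstream fix, and the assumption is that the problematic names contain characters such as `:` (for instance if a compiler expands `__FUNCTION__` to a qualified name):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: keeps alphanumerics, '-', '_' and '.', and replaces
// every other character with '_'.
std::string SanitizeForFileName(const std::string& raw) {
  std::string out;
  out.reserve(raw.size());
  for (unsigned char c : raw) {
    const bool keep = std::isalnum(c) || c == '-' || c == '_' || c == '.';
    out.push_back(keep ? static_cast<char>(c) : '_');
  }
  return out;
}

// Hypothetical helper: joins the directory and the sanitized name with an
// explicit separator instead of relying on plain string concatenation.
std::string TempFilePath(const std::string& temp_dir,
                         const std::string& function_name) {
  return temp_dir + "/" + SanitizeForFileName(function_name);
}
```

In a test this might be called as `TempFilePath(testing::TempDir(), __FUNCTION__)`.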

[xla:cpu] Do not expand convolution feature group if the convolution …

5db58f8

…is supported by libraries. PiperOrigin-RevId: 846299624

[XLA:GPU] Use StreamState as rendezvous value.

c549ee4

We can add an output pointer to StreamState, and it will then have all the information needed for the rendezvous. No need to have a separate RendezvousValue struct. PiperOrigin-RevId: 846313928

[XLA:GPU] don't stop traversal when sinking bitcasts

1a5402d

For example if we have a fusion ``` dot bitcast1 ... bad_op ... bitcast2 ... ROOT root = ... ``` we can still benefit from sinking bitcast2 even though instructions between dot and bad_op will not change. PiperOrigin-RevId: 846314341

tensorflower-gardener and others added 19 commits December 23, 2025 20:54

Automated Code Change

5e56a93

PiperOrigin-RevId: 848393091

Automated Code Change

4272c47

PiperOrigin-RevId: 848423026

Fix test when it launched on the machine with 8 devices.

2b19036

PiperOrigin-RevId: 848429925

Automated Code Change

354860e

PiperOrigin-RevId: 848434764

Automated Code Change

4a2a5ae

PiperOrigin-RevId: 848441651

Automated Code Change

1a037d6

PiperOrigin-RevId: 848455572

Update GraphDef version to 2451.

c8d1b4e

PiperOrigin-RevId: 848467225

compat: Update forward compatibility horizon to 2025-12-24

6b075d4

PiperOrigin-RevId: 848467272

Automated Code Change

f77105b

PiperOrigin-RevId: 848475361

[XLA] Move xla.GpuTopology proto out of PJRT to XLA.

8e84202

It has to become a part of Compiler::CompilerOptions, but CompilerOptions should not depend on PJRT. So, moving it here. PiperOrigin-RevId: 848523186

Automated Code Change

fcc2b82

PiperOrigin-RevId: 848534440

Merge remote-tracking branch 'upstream/master' into develop-upstream-…

d040694

…sync-251224

Fix merge conflicts

5d50048

Revert 692e221

14e1429

Remove leftover diff symbols

422ffee

Fix gpu_device_info_test

1feb80c

Fix amdgpu_register_spilling_test

cd67c4f

Use googletest status assert macros patches in tf workspace2.bzl too

1787300

i-chaochen self-requested a review December 30, 2025 11:53

mmakevic-amd force-pushed the develop-upstream-sync-251224 branch from aeda463 to 9135a29 Compare January 12, 2026 23:58

Remove remaining cuda-only tags and move failing subtests to test scr…

b28eff1

…ipts

mmakevic-amd force-pushed the develop-upstream-sync-251224 branch from 9135a29 to b28eff1 Compare January 13, 2026 00:00

mmakevic-amd added 2 commits January 13, 2026 00:01

Fix device_tracer_test build error and move failing subtests to test …

c72c18e

…script

Fix xla_gpu_compile_lib_test

3a69036

i-chaochen approved these changes Jan 14, 2026

mmakevic-amd merged commit 1d673b5 into develop-upstream Jan 14, 2026
5 checks passed
